Oversampling Method Based on Gaussian Distribution and K-Means Clustering

Abstract

Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority class. Therefore, accuracy may be high, but the model cannot recognize instances of the minority class and classify them correctly, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to simulate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances between and within classes. Various experiments were carried out with six classifiers and four oversampling methods. Experimental results on different datasets show that GK-Means outperforms the other oversampling methods and improves classification performance as measured by F1-score and Accuracy.
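The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it names: cluster the minority class with K-means, fit a multivariate Gaussian to each cluster, and sample synthetic points from those Gaussians. It is not the authors' GK-Means implementation. The function names (`kmeans`, `gaussian_kmeans_oversample`), the proportional split of new samples across clusters, and the small `ridge` term added to each covariance matrix are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster label for every row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def gaussian_kmeans_oversample(X_min, n_new, k=2, seed=0, ridge=1e-6):
    """Hypothetical helper: draw n_new synthetic minority-class samples."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X_min, k, rng)
    clusters = [X_min[labels == j] for j in range(k) if np.any(labels == j)]
    # Assumption: split the quota across clusters in proportion to their size.
    sizes = np.array([len(c) for c in clusters])
    quotas = np.floor(n_new * sizes / sizes.sum()).astype(int)
    quotas[: n_new - quotas.sum()] += 1  # hand out the rounding remainder
    synthetic = []
    for c, q in zip(clusters, quotas):
        d = c.shape[1]
        mean = c.mean(axis=0)
        # Sample covariance of the cluster; a single point gets zero covariance.
        cov = np.cov(c, rowvar=False).reshape(d, d) if len(c) > 1 else np.zeros((d, d))
        cov = cov + ridge * np.eye(d)  # assumption: ridge keeps cov well-conditioned
        synthetic.append(rng.multivariate_normal(mean, cov, size=q))
    return np.vstack(synthetic)

# Usage: a toy minority class made of two well-separated groups.
demo_rng = np.random.default_rng(1)
X_min = np.vstack([
    demo_rng.normal(0.0, 0.1, size=(10, 2)),
    demo_rng.normal(5.0, 0.1, size=(10, 2)),
])
X_new = gaussian_kmeans_oversample(X_min, n_new=30, k=2)
print(X_new.shape)
```

Sampling per cluster rather than from one Gaussian over the whole minority class is what keeps the synthetic points near the observed sub-groups; this is one plausible reading of "imbalances within classes", not a confirmed detail of the paper.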

Similar resources

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar in some sense or another. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification a...

Unsupervised Image Segmentation Method based on Finite Generalized Gaussian Distribution with EM & K-Means Algorithm

In image processing, model-based image segmentation plays a dominant role in image analysis and image retrieval. Recently, much work has been reported on image segmentation based on finite Gaussian mixture models using the EM algorithm (Yiming Wu et al., 2003; Yamazaki, 1998). However, in some images the pixel intensities inside the image regions may not be mesokurtic or bell shaped, b...

Mini-model method based on k-means clustering

The mini-model method (MM-method) is an instance-based learning algorithm, similar to the k-nearest neighbor method, the GRNN network, or the RBF network, but its idea is different. MM operates only on data from the local neighborhood of a query. The paper presents a new version of the MM-method based on the k-means clustering algorithm. The domain of the model is calculated using the k-means algorithm. Clus...

Enhanced Clustering Based on K-means Clustering Algorithm and Proposed Genetic Algorithm with K-means Clustering

This paper targets a variety of techniques, approaches, and distinct areas of research that are useful and regarded as important in the field of data mining. The overall purpose of the data mining process is to extract useful information from a large data set and transform it into a form that is understandable for further use. Clustering i...

Journal

Journal title: Computers, Materials & Continua

Year: 2021

ISSN: 1546-2218, 1546-2226

DOI: https://doi.org/10.32604/cmc.2021.018280